
S.2   Step-by-step discussion of an emulator for Space Invaders

   How I wrote a Space Invaders emulator
       An essay in Arcade Emulation
     Neil Bradley - neil@synthcom.com

   Before I get into this, I want to stress that the Space Invaders mentioned
   in this document is not yet available in EMU 2.0, but it will be after
   mid-January 1997. It is basically a step by step of how I wrote a Space
   Invaders emulator from start to finish.

S.2.1    My Background

   I spent 7 years at Intel Corporation, most of that optimizing Intel-based
   assembly language. I know many microprocessors and microcontrollers, and
   also have a hardware background. I am also fluent in C & C++ and have a
   good amount of PC hardware experience.

   I'm not trying to scare you off or brag at all, I'm just trying to explain
   where I am coming from when I say things. The "How did he know that?"
   question might come up during this document, and the answer most likely is
   "been there - done that". ;-)

S.2.2    Getting started

   The SI emulator was to be the basis for me writing a Z80 emulator (which
   as of this writing is still in progress) in assembly for all to consume.
   One of the best ways to write a processor emulator is to make it emulate
   code you know runs. No video games I know of come up and execute garbage
   code, so you're pretty well assured that using video game code as a test
   case is a pretty good idea in the development stage.

   I must stress that the most valuable resource to us is the internet. Learn
   all you can from all that is available. That is partially how I did the
   Z80 emulator. Take what others have to offer, learn from it, and in turn
   produce something else that others can benefit or learn from. 

   If you're not writing your own processor emulator, skip to section S.2.4,
   "Space Invaders Specifics".

   I had prior experience writing a 6502 emulator in assembly and knew what
   would work and what wouldn't. This time I wanted to write a general
   purpose extensible emulator that would be multi-processor aware, and would
   emulate as fast as possible. I also got beaten up for not "being portable".
   If you want portable, go grab any of the slower than desired CPU emulators
   available on the internet. If you want high performance on lower end
   machines, use assembly. Pick what architecture you're going to go with and
   do it! I also don't intend on getting into the C vs. Assembly argument.
   I'm getting 4X the performance AT A MINIMUM against various C emulators
   for the same CPU, so there is something to be said for assembly. The
   answer is written in stone for me. Judge for yourself. If you want it
   portable, you'll take a performance hit. Accept it now.

   I decided to first see what others had done with Z80 emulators, so I
   searched the web for "+z80 +emulator" and got a few links. They led me to
   a few sites, so I downloaded Marat Fayzullin's Z80 emulator and xtrs
   (TRS-80 emulator for Unix) and had a peek. It's always good to have more
   than one person's interpretation as to how an emulated processor should
   function, so find as many as you can when something doesn't make sense.

   I also purchased several Z80 books, two of which I found to be
   particularly useful. One is "How to program the Z80" by Rodnay Zaks and
   the other is "Z80 Assembly Language Programming" by Peter W. Steele and
   Ivan Tomek. These are handy references. Rodnay's book has timing
   information, but is missing some instructions. It's also nice to have two
   or more books to check against each other, as there are often
   discrepancies. Both books mentioned above are out of print, but can
   usually be found at tech book stores.

   I wrote the basic "main loop" routine to fetch instructions and jump
   through a large jump table to the handler for the corresponding Z80
   instruction. I also predefined what some of the registers would hold
   while the Z80 was emulating. HL is stored in BX, BC is stored in CX, DH
   contains the Z80's flags, and DL contains the accumulator. ESI is the
   source execution address, and EAX & EDI are used for general purpose
   computation throughout the emulator. The only time I ever used the high
   part of EBX or ECX was to quickly save the state of some registers, do
   some operations that could use them, and shift them back. Something like
   this:

	shl	edx, 16		; Save flags & accumulator for later
	; ... do work with dx here ...
	shr	edx, 16		; Restore flags & accumulator

   It functions just like a push and pop, but doesn't take up a memory cycle
   and doesn't touch the cache.
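
   As a rough illustration of the fetch-and-dispatch loop described above,
   here is a minimal sketch in portable C rather than assembly. It is not
   the emulator's actual code: the Cpu structure, its field names, and the
   run() function are assumptions made for this example, and only four
   opcodes are handled. A C switch stands in for the assembly jump table.

```c
#include <stdint.h>

/* Hypothetical CPU state; the field names are illustrative only. */
typedef struct {
    uint8_t  a;           /* accumulator */
    uint16_t pc;          /* program counter */
    uint8_t  mem[65536];  /* flat 64K address space */
} Cpu;

/* Execute until HALT or an unimplemented opcode is reached.
   Returns -1 on HALT, otherwise the opcode that is not handled yet. */
int run(Cpu *cpu)
{
    for (;;) {
        uint8_t op = cpu->mem[cpu->pc++];  /* fetch straight from memory */
        switch (op) {                       /* dense switch: usually a jump table */
        case 0x00:                          /* NOP */
            break;
        case 0x3C:                          /* INC A (flags omitted in this sketch) */
            cpu->a++;
            break;
        case 0x3E:                          /* LD A,n */
            cpu->a = cpu->mem[cpu->pc++];
            break;
        case 0x76:                          /* HALT */
            return -1;
        default:
            cpu->pc--;                      /* point back at the offending opcode */
            return op;
        }
    }
}
```

   Returning the unhandled opcode makes it easy to see which instruction
   needs to be written next.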

   There are several things you need to keep in mind when writing a processor
   emulator:

   1) Get RID of CALL functions. They are very expensive. For example, the
      Z80 emulator has no calls. It's all handled by jumps. It might be
      convenient to have the different addressing modes in a nice callable
      table but your emulator will take a good hit if you do it this way.
      Consider using macros.
   2) Minimize jumps whenever possible.
   3) Keep the most commonly used virtual registers in native processor
      registers.
   4) Minimize memory accesses. These are killers.
   5) Keep total data & code accesses as far under 256K as possible. The
      smaller the total data & code accesses, the better the chance of
      fitting into the system's cache.
   6) Use macros to handle things like flags - not jump or call tables.
   7) Create a general purpose read memory/write memory routine (and
      read/write I/O if the processor you're emulating requires it). Use
      this when doing data access functions, but DO NOT use it to fetch
      instructions. Most of your time will be spent emulating the actual
      instructions, not moving data around.
   8) Don't branch if you don't have to. Keep the most commonly executed path
      one that doesn't take a conditional branch. Taken branches can cause
      performance hits as well.
   9) Make use of the instructions available to you. Even if you think you
      know a processor through and through, I would advise sitting down for
      quite some time studying its instruction set and taking advantage of
      every possible instruction you can.
   10) Use XCHG to temporarily save off registers you just absolutely
       MUST use.
   11) For flags, use lookup tables if you can find a convenient way to look
       up add/subtract/dec/inc flags. I did this with the Half carry &
       overflow flags in the Z80 emulator.
   12) When using the Intel architecture, use the 486 and later instructions
       even if you're not doing 32 bit code (though I would recommend that
       you DO). You can still use the extended parts of registers even
       though you're in real mode.
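
   The lookup-table idea in item 11 can be sketched in C as follows. This is
   only an illustration: it builds a sign/zero/parity table rather than the
   half carry & overflow tables mentioned above, because that case is the
   simplest to show. The names szp_table, init_flag_table, and
   flags_after_logic are made up for this example. The flag bit positions
   are the standard Z80 ones.

```c
#include <stdint.h>

/* Z80 flag bit positions for sign, zero, and parity. */
enum { FLAG_S = 0x80, FLAG_Z = 0x40, FLAG_P = 0x04 };

static uint8_t szp_table[256];  /* flags for every possible 8-bit result */

/* Build the table once at startup. */
void init_flag_table(void)
{
    for (int v = 0; v < 256; v++) {
        int bits = 0;
        for (int b = 0; b < 8; b++)
            bits += (v >> b) & 1;
        uint8_t f = 0;
        if (v & 0x80)        f |= FLAG_S;
        if (v == 0)          f |= FLAG_Z;
        if ((bits & 1) == 0) f |= FLAG_P;  /* P is set for even parity */
        szp_table[v] = f;
    }
}

/* Flags after OR or XOR: one table lookup, no branches. */
uint8_t flags_after_logic(uint8_t result)
{
    return szp_table[result];
}
```

   The table costs 256 bytes, which fits comfortably with the cache budget
   in item 5.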

   To anyone saying "I'd like my code to run on a 386 or lower", consider
   what you're saying. These are pretty weak machines, and even 486
   motherboards with CPUs are going for $60-$100 new these days. The extra
   constraint you'll put on yourself by not having the extended registers
   and 32 bit addressing (and some 486 instructions) will cause you to spend
   more time trying to get good performance out of low end machines, and the
   result would wind up hurting performance on a 486 or Pentium. Be careful,
   and weigh the extra work of targeting a low-end system when the next to
   low end machines aren't that much more money.

   These are just some of the guidelines that have worked extremely well for
   me.

S.2.3    Disassemblers

   Another thing that is a life saver for processor emulation is a
   disassembler. Get your hands on a disassembler that you can trust. I found
   such a beast at Riddle's Roost (Sean Riddle's excellent Williams page) at
   http://www.ionet.net/~sriddle/willy.shtml. (that is spelled correctly).
   Sean has given us a freeware disassembler to use (thank you Sean!).

   I modified the disassembler to do two things - disassemble a single
   instruction and display registers on a single line. I displayed all the
   registers I was interested in, and a disassembly including the program
   counter and the actual bytes that were being disassembled. This is what
   you need for a simple debugger.
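
   A single-line trace of this kind might be formatted as in the C sketch
   below. The Regs structure, the choice of registers, and the column layout
   are all assumptions for illustration. The mnemonic text is expected to
   come from whatever disassembler you are using.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register snapshot; field names are illustrative. */
typedef struct {
    uint16_t pc, bc, hl, sp;
    uint8_t  a, f;
} Regs;

/* Format one trace line: address, raw opcode bytes, mnemonic, registers.
   `text` is the disassembler's output for this one instruction.
   Returns the line length, as reported by snprintf. */
int format_trace(char *out, size_t outsz, const Regs *r,
                 const uint8_t *bytes, int nbytes, const char *text)
{
    char hex[16] = "";
    int pos = 0;

    /* Up to four opcode bytes, each as two hex digits and a space. */
    for (int i = 0; i < nbytes && i < 4; i++)
        pos += snprintf(hex + pos, sizeof hex - (size_t)pos,
                        "%02X ", (unsigned)bytes[i]);

    return snprintf(out, outsz,
                    "%04X: %-12s %-14s A=%02X F=%02X BC=%04X HL=%04X SP=%04X",
                    (unsigned)r->pc, hex, text,
                    (unsigned)r->a, (unsigned)r->f,
                    (unsigned)r->bc, (unsigned)r->hl, (unsigned)r->sp);
}
```

   Fixed-width columns keep successive lines aligned, so a changed register
   stands out when scrolling through a trace.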

   Before I did all this, I allocated 64K and loaded in the ROMs at the
   appropriate places, and set the program counter to 0 (for Z80's). Then it
   went into my main loop.
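
   The setup step might look something like this in C. It is a sketch under
   assumptions: map_rom and reset_cpu are hypothetical names, and the ROM
   image is passed in as a buffer, whereas a real loader would first read
   each ROM file from disk.

```c
#include <stdint.h>
#include <string.h>

#define ADDRESS_SPACE 0x10000u  /* 64K, the full 16-bit address range */

static uint8_t  memory[ADDRESS_SPACE];
static uint16_t pc;

/* Copy one ROM image into the address space at `base`.
   Returns 0 on success, -1 if the image would run past 64K. */
int map_rom(uint16_t base, const uint8_t *image, size_t len)
{
    if (len > ADDRESS_SPACE - base)
        return -1;
    memcpy(&memory[base], image, len);
    return 0;
}

/* Reset: execution starts at address 0. */
void reset_cpu(void)
{
    pc = 0;
}
```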

   The main loop would allow me to single step through the execution, run
   with a disassembly, and run to a specific address. This way you can watch
   the code modify the registers & see if it's right or not, or if it takes
   a wrong turn somewhere.

   I started off with no instructions emulated. I ran the Space Invaders code
   and let it run until it hit an invalid opcode. I implemented that
   instruction, recompiled, ran the code to that point again, etc... and
   kept going until things were completely implemented. This was a good way
   for me to keep an eye on each instruction, and to at least semi-verify
   that they were working as I implemented them.

   This is important, because once you have the basic code to handle one type
   of an instruction (like ADD or SUB), implementing other similar
   instructions using different registers is easier, and most likely your
   code will be debugged before the other variants are added.

   I kept at it until the code was running well enough to keep running
   without hitting unimplemented instructions. On to the game information...

S.2.4    Space Invaders Specifics

   The first thing I did was go looking for a memory map of this game. It
   turns out that I had gathered enough information just by watching what the
   code was doing to figure out where most everything was, but I wanted to
   see if things were where I thought they were. So I got ahold of Michael
   Adcock's Emulator How-To guide, and the Space Invaders memory map was
   there.

   It told me where the graphics RAM, ROM, and user RAM were, and a little bit
   of information about the I/O ports that were used. I also figured that SI
   used some form of interrupt because I did hit an "EI" instruction. On the
   Z80, you can either jump to a specific vector or have the hardware insert
   an instruction on the bus to execute when an interrupt happens. I didn't
   have schematics, but my guess was they were implementing an RST $xx
   instruction or something.

   Then I remembered - Space Invaders uses an 8080, where all interrupts go
   to 0008h and all NMI's go to 0010h. Sure enough, a quick disassembly of
   these addresses yielded intelligent code. ;-) So I threw down a magic
   interrupt to occur about every 20-30 milliseconds and now things started
   to happen.
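
   A timed interrupt like this can be driven from a cycle counter. The C
   sketch below is one way to do it, not the emulator's actual code. The Cpu
   structure, tick(), take_interrupt(), and the IRQ_PERIOD_CYCLES value are
   assumptions: 40000 cycles is about 20 milliseconds only if the emulated
   CPU is taken to run at 2 MHz. The vector 0008h is the one named above.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc, sp;
    int      int_enabled;    /* set by EI, cleared when an interrupt is taken */
    uint8_t  mem[65536];
    long     cycles_to_irq;  /* countdown until the next timed interrupt */
} Cpu;

/* Hypothetical period; the right value depends on the emulated clock. */
#define IRQ_PERIOD_CYCLES 40000L

/* Behave like an RST: push the return address, then jump to the vector. */
static void take_interrupt(Cpu *cpu, uint16_t vector)
{
    cpu->mem[--cpu->sp] = (uint8_t)(cpu->pc >> 8);    /* high byte first */
    cpu->mem[--cpu->sp] = (uint8_t)(cpu->pc & 0xFF);  /* then low byte */
    cpu->pc = vector;
    cpu->int_enabled = 0;
}

/* Call after each emulated instruction with the cycles it consumed.
   Returns 1 if an interrupt was delivered, 0 otherwise. */
int tick(Cpu *cpu, int cycles)
{
    cpu->cycles_to_irq -= cycles;
    if (cpu->cycles_to_irq > 0)
        return 0;
    cpu->cycles_to_irq += IRQ_PERIOD_CYCLES;
    if (!cpu->int_enabled)
        return 0;            /* masked: this one is simply dropped */
    take_interrupt(cpu, 0x0008);
    return 1;
}
```

   Counting emulated cycles instead of wall-clock time keeps the game's
   behaviour the same no matter how fast the host machine is.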

   I didn't know the orientation of the graphics, but a few things I did know
   about graphics gave me some ideas. I knew that Space Invaders was black
   and white, so I figured 1 bit per pixel. I hooked any writes to Space
   Invaders video memory to call a function I had written to poke it directly
   into the monochrome card's memory.
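
   Hooking video writes can be done inside the general purpose write
   handler. The C sketch below shows the idea. The address ranges are
   assumptions and should be checked against the memory map you are working
   from. video_hook is a hypothetical stand-in that only counts writes,
   where a real one would plot into the display adapter's memory.

```c
#include <stdint.h>

/* Assumed address ranges; verify them against your memory map. */
#define ROM_END     0x2000u  /* writes below this address are ignored */
#define VIDEO_START 0x2400u
#define VIDEO_END   0x4000u  /* one past the last video byte */

static uint8_t  memory[65536];
static unsigned video_writes;  /* how many writes reached the hook */

/* Hypothetical hook: a real one would draw the byte on screen. */
static void video_hook(uint16_t offset, uint8_t value)
{
    (void)offset;
    (void)value;
    video_writes++;
}

/* General purpose write handler, used for data accesses only. */
void write_mem(uint16_t addr, uint8_t value)
{
    if (addr < ROM_END)
        return;                        /* ROM is read-only */
    memory[addr] = value;
    if (addr >= VIDEO_START && addr < VIDEO_END)
        video_hook((uint16_t)(addr - VIDEO_START), value);
}
```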

   I got a blotchy image that looked as if my horizontal sync was off
   somehow. The width of the Space Invaders image turned out to be 20h bytes
   (32 * 8 pixels = 256 pixels). Once I mapped it to the video monitor, it
   displayed SI sideways! So I hacked together a routine to turn it from a
   horizontal image to a vertical image, and up came the game - until the
   invaders started to come. Space Invaders is 248 X 256 - bits packed
   vertically.
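
   A rotation routine of this kind could look like the C sketch below. It
   assumes, for illustration only, that bit 0 of each byte is the leftmost
   pixel and that a quarter turn counterclockwise is the right direction.
   Both may need flipping on real hardware. The output uses one byte per
   pixel to keep the example short.

```c
#include <stdint.h>

#define SRC_BYTES_PER_LINE 32                 /* 20h bytes per line */
#define SRC_WIDTH (SRC_BYTES_PER_LINE * 8)    /* 256 pixels per line */

/* Rotate a 1-bit-per-pixel image a quarter turn counterclockwise.
   src: `lines` rows of 32 bytes, bit 0 of each byte assumed leftmost.
   dst: one byte per pixel, `lines` pixels wide and 256 pixels tall. */
void rotate_ccw(const uint8_t *src, int lines, uint8_t *dst)
{
    for (int sy = 0; sy < lines; sy++) {
        for (int sx = 0; sx < SRC_WIDTH; sx++) {
            uint8_t byte = src[sy * SRC_BYTES_PER_LINE + sx / 8];
            uint8_t on   = (uint8_t)((byte >> (sx % 8)) & 1);
            int dx = sy;                  /* source row becomes the column */
            int dy = SRC_WIDTH - 1 - sx;  /* source column becomes the row */
            dst[dy * lines + dx] = on;
        }
    }
}
```

   Rotating the whole image on every frame is slow. Once the mapping is
   known, each video write can be translated to its rotated position
   instead.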

   I got one invader and that was it. After looking through all kinds of
   code, I came across a section that looped through 55 bytes, checking
   whether each one was zero or non-zero. If a byte was zero, it would skip
   to the next byte. If it was non-zero, it would draw an alien. There are
   55 aliens in an invading fleet. ;-) That's where I made the connection.
   BTW, I modified the debugger to output to the monochrome card so I could
   watch what was going on.

   Something wasn't happening. I double checked the emulation of the
   instructions I had implemented, and everything looked in order (minus a
   few bugs and those didn't affect it). I hooked up another Z80 emulator to
   it and got the same results. At this point I wasn't debugging a problem
   with the CPU emulator itself - I was debugging a problem with game
   environment emulation.

   I then remembered that I hadn't implemented a periodic NMI, so I decided
   to hook it up with about 50 millisecond intervals. I reran it. Voila. The
   fleet came up and things worked great.

   I had also taken notes during the emulation process and discovered that
   several I/O addresses were being accessed: 02h, 03h, 04h, 05h, and 06h.
   05h & 06h were being written to but never read from. I took a look in the
   emulation how-to and found a section on the actual I/O addresses. I
   hooked the port values up to basic keys on the keyboard and was able to
   start playing the game.
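
   Wiring keys to ports can be sketched in C as below. Everything specific
   here is an assumption for illustration: the bit assignments in the enum
   are invented, and treating port 02h as the only input port is a
   simplification. Take the real assignments from the memory map you are
   using.

```c
#include <stdint.h>

/* Hypothetical bit assignments; the real ones come from the memory map. */
enum { KEY_COIN = 0x01, KEY_START = 0x04, KEY_FIRE = 0x10,
       KEY_LEFT = 0x20, KEY_RIGHT = 0x40 };

static uint8_t input_port;     /* current state of the input bits */
static uint8_t last_out[256];  /* last value written to each port */

/* Called from the keyboard handler on key press and release. */
void set_key(uint8_t mask, int down)
{
    if (down)
        input_port |= mask;
    else
        input_port &= (uint8_t)~mask;
}

/* IN handler: only one input port is modelled in this sketch. */
uint8_t read_port(uint8_t port)
{
    return (port == 0x02) ? input_port : 0x00;
}

/* OUT handler: record the value so it can be inspected in the debugger. */
void write_port(uint8_t port, uint8_t value)
{
    last_out[port] = value;
}
```

   Recording the last value written to each output port is a cheap way to
   start guessing what the write-only ports are for.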

   I don't have schematics, and to this day still haven't figured out the
   correct NMI/INT ratio values, but increasing interrupts causes the shots
   to move much much faster, and increasing NMI's causes the fleet to invade
   faster.

   I must admit Space Invaders is a pretty simple game to emulate. Not much
   to it, so if you're considering emulating a video game for the first time,
   keep some of these ideas in mind, and try 'em out. The only thing I'd do
   differently is work against a reference machine or schematics, if I had
   them. When a particular I/O address or memory address is puzzling me, I
   check the address decoders on the schematics to try and find out where
   it's wired to. Working without them is similar to reading a roadmap
   without any road names or town names on it, and filling it in as you go.
   Having a good command of how hardware works will help immensely.
